Limitations of Learning Via Embeddings
Author
Abstract
This paper considers the embeddability of general concept classes in Euclidean half spaces. By embedding in half spaces we refer to a mapping from some concept class to half spaces so that the labeling given to points in the instance space is retained. The existence of an embedding for some class may be used to learn it using an algorithm for the class it is embedded into. The Support Vector Machines paradigm employs this idea for the construction of a general learning system. We show that an overwhelming majority of the family of finite concept classes of constant VC dimension d cannot be embedded in low-dimensional half spaces. (In fact, we show that the Euclidean dimension must be almost as high as the size of the instance space.) We strengthen this result even further by showing that an overwhelming majority of the family of finite concept classes of constant VC dimension d cannot be embedded in half spaces (of arbitrarily high Euclidean dimension) with a large margin. (In fact, the margin cannot be substantially larger than the margin achieved by the trivial embedding.) Furthermore, these bounds are robust in the sense that allowing each image half space to err on a small fraction of the instances does not significantly weaken the dimension and margin bounds. Our results indicate that any universal learning machine, which transforms data into Euclidean space and then applies linear (or large-margin) classification, cannot enjoy any meaningful generalization guarantees based on either VC dimension or margin considerations.
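To make the abstract's objects concrete, here is an editorial sketch (the map phi, the vectors w_c, the margin symbol gamma, and the use of Python/numpy are illustrative assumptions, not the paper's notation or construction). Embedding a concept class C over a finite instance space X into half spaces of R^n means choosing a map phi : X -> R^n and, for every concept c in C, a weight vector w_c in R^n such that sign(<w_c, phi(x)>) = c(x) for every instance x; the embedding achieves margin gamma if, after normalizing w_c and phi(x) to unit length, |<w_c, phi(x)>| >= gamma for all x and c. The sketch below exhibits one simple embedding in this spirit: mapping the i-th of m instances to the i-th standard basis vector realizes every labeling in dimension m with margin 1/sqrt(m).

    import numpy as np

    # Sketch of a simple half-space embedding for a finite concept class.
    # Whether this coincides with the paper's "trivial embedding" is an assumption;
    # it only illustrates a baseline of dimension m and margin 1/sqrt(m).
    m = 8                                          # size of the instance space (toy value)
    rng = np.random.default_rng(0)
    concepts = rng.choice([-1, 1], size=(5, m))    # a toy concept class: 5 labelings of m points

    phi = np.eye(m)                                # phi(x_i) = e_i, the i-th standard basis vector
    for c in concepts:
        w_c = c / np.linalg.norm(c)                # unit-norm weight vector representing concept c
        scores = phi @ w_c                         # <w_c, phi(x_i)> for every instance x_i
        assert np.all(np.sign(scores) == c)        # the labeling is retained
        print("margin:", np.min(np.abs(scores)))   # equals 1/sqrt(m) for every concept

Every concept comes out with the same margin 1/sqrt(m) (about 0.35 for m = 8) at the cost of dimension m; the abstract's lower bounds say that, for most finite classes of constant VC dimension, neither the dimension nor the margin can be improved substantially over such a baseline.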
Similar Resources
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering
In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...
Online Learning of Interpretable Word Embeddings
Word embeddings encode semantic meanings of words into low-dimensional word vectors. In most word embeddings, one cannot interpret the meanings of specific dimensions of those word vectors. Nonnegative matrix factorization (NMF) has been proposed to learn interpretable word embeddings via non-negative constraints. However, NMF methods suffer from scale and memory issues because they have to mainta...
Learning Lexical Embeddings with Syntactic and Lexicographic Knowledge
We propose two improvements on lexical association used in embedding learning: factorizing individual dependency relations and using lexicographic knowledge from monolingual dictionaries. Both proposals provide low-entropy lexical cooccurrence information, and are empirically shown to improve embedding learning by performing notably better than several popular embedding models in similarity tas...
Syntactico Semantic Word Representations in Multiple Languages
Our project is an extension of the project “Syntactico Semantic Word Representations in Multiple Languages”[1]. The previous project aims to improve the semantic representation of English vocabulary by incorporating local context with global context and by supplying multiple embeddings per word to capture homonymy and polysemy. It also introduces a new neural network architecture that learns the w...
Sentiment Analysis by Joint Learning of Word Embeddings and Classifier
Word embeddings are representations of individual words of a text document in a vector space and they are often useful for performing natural language processing tasks. Current state of the art algorithms for learning word embeddings learn vector representations from large corpora of text documents in an unsupervised fashion. This paper introduces SWESA (Supervised Word Embeddings for Sentiment...
Deep Multilingual Correlation for Improved Word Embeddings
Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of ...